DISCUSSION OF: A STATISTICAL ANALYSIS OF MULTIPLE 
TEMPERATURE PROXIES: ARE RECONSTRUCTIONS OF 
SURFACE TEMPERATURES OVER THE LAST 
1000 YEARS RELIABLE? 1 

By Doug Nychka and Bo Li 
National Center for Atmospheric Research and Purdue University 

This article (MW) has stimulated much valuable discussion and helped to 
focus attention on an important area for the application of statistics. Given 
the limited space, however, we reluctantly comment only on the second and 
last sections. 

Excursions in the history of science. Although Section 2 of this paper 
is lively reading, we feel that the viewpoint is not balanced and emphasizes 
statistical correctness over the broader issues of scientific understanding. 
Recounting a controversy that has a political dimension and involves 
scientific issues from several disciplines is perhaps better left to a historian 
of science. Wegman's quote on page 9 of the article is actually from a later 
written response to Representative Stupak, not from the original testimony 
[see Questions surrounding the hockey stick (2006)]. To follow this debate, we 
encourage readers to also read the transcript of the congressional hearings 
and the contemporaneous report by the National Academies, NRC (2006). 

Paleoclimate reconstructions. The Wegman committee's original report 
stopped short of redoing the temperature reconstruction with Mann's data 
and with the correct centering of the principal components. Although this 
exercise was beyond the report's charge, it is sound statistical practice to 
evaluate changes in intermediate methodology by their influence on the final 
statistical inference. The string of references cited by MW on page 10, 
beginning with Mann and Rutherford (2002), established that the reconstruction 
is robust to centered versus noncentered methods if several PCs are included. 
The Wegman committee might have uncovered this finding as well. In this 
context, we applaud MW for 
carrying through to a reconstruction to assess the impact of methodological 
choices. We term the model used in Section 5 a direct approach because 
it builds a predictive regression model for temperature directly from the 
proxies. To complement this article, we discuss an indirect approach that 
takes advantage of some current work in Bayesian statistics. 
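As a point of reference, the direct approach can be sketched as an ordinary least-squares regression of temperature on the proxies over a calibration period, followed by prediction from the proxies alone in earlier years. This is a minimal sketch on synthetic data, not MW's method: MW use the Lasso and related regressions rather than plain least squares, and the helper name `direct_reconstruction`, the proxy loadings and the noise level are hypothetical choices for illustration.

```python
import numpy as np

def direct_reconstruction(proxies_cal, temp_cal, proxies_past):
    """Direct approach (hypothetical helper): least-squares regression of
    temperature on proxies in a calibration period, then prediction."""
    X_cal = np.column_stack([np.ones(len(proxies_cal)), proxies_cal])   # add intercept
    coef, *_ = np.linalg.lstsq(X_cal, temp_cal, rcond=None)
    X_past = np.column_stack([np.ones(len(proxies_past)), proxies_past])
    return X_past @ coef

rng = np.random.default_rng(0)
temp = rng.normal(size=150)                            # synthetic temperature series
loadings = np.array([0.8, 0.5, 1.2])                   # hypothetical proxy sensitivities
proxies = np.outer(temp, loadings) + rng.normal(scale=0.3, size=(150, 3))
# Calibrate on the last 100 "years", reconstruct the first 50
recon = direct_reconstruction(proxies[50:], temp[50:], proxies[:50])
corr = np.corrcoef(recon, temp[:50])[0, 1]
print(f"correlation with held-out temperature: {corr:.2f}")
```

The regression runs from proxies to temperature, which is exactly the direction the indirect approach reverses.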

A Bayesian hierarchical model (BHM). Although a direct approach may 
be useful for comparison with previous work, we hold that a BHM provides 
a better solution to the reconstruction problem. A BHM can be described 
as indirect in that one models the dependence of the proxies conditional on 
temperature. Bayes' theorem is then used to invert the relationship to arrive 
at a posterior predictive distribution of temperature given the data. We 
sketch this approach below using seminal ideas from Tingley and Huybers 
(2010) and some features from Li, Nychka and Ammann (2010). Let $T_t$ be 
the true temperatures on a grid at time $t$ and let the Northern Hemisphere 
(NH) temperature, $y_t$, be a linear combination of the $T_t$. A possible BHM 
for this problem is: 

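Data level (proxies):
$$x_{t,i} = \gamma_i h_i T_t + u_{t,i}$$

Process level, space-time process:
$$T_t = y_t \mathbf{1} + v_t, \qquad v_t = A v_{t-1} + e_t, \qquad e_t \sim N(0, \Sigma)$$

Process level, NH mean process:
$$y_t = \alpha^* + S_t \omega_S + V_t \omega_V + C_t \omega_C + w_t$$

Prior level:
$$[\gamma, \omega, A, \Sigma, \ldots]$$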



The data equation asserts that the $i$th proxy at time $t$ is a linear 
combination of the true temperature field plus noise; $h_i$ is a known row 
vector of weights and $\gamma_i$ is an unknown parameter that "calibrates" 
each proxy. The errors, $u_{t,i}$, for each proxy ($i$) may have 
autocorrelation, but we assume that the noise time series are independent 
between proxies; the goal is to explain correlation among proxies by the 
temperature field (or other geophysical variables). At the process level the 
temperature field evolves as a space-time process with the variation in the 
NH average prescribing its mean level. Here $v_t$ is assumed to be a 
first-order vector autoregressive process with $A$ and $\Sigma$ determining 
the spatial dependence. The NH mean level 
reflects the basic energy balance of the Earth's climate system. The external 
series of solar irradiance ($S$), volcanic dust ($V$) and carbon dioxide 
concentrations ($C$) are large-scale drivers of temperature. A low-order 
autoregressive process, $w_t$, reflects additional interannual variation. 
Finally, the prior level favors diffuse priors on unknown statistical 
parameters but can also consolidate information across many similar 
parameters. Specifically, priors for the regression parameters $\{\gamma_i\}$ 
can borrow strength across proxies and control overfitting. Given this 
hierarchy, one samples the predictive distribution for $y_t$ using Bayes' 
theorem and Markov chain Monte Carlo. 
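To make the three levels concrete, the following sketch simulates forward through a toy version of the hierarchy. It assumes a diagonal VAR matrix $A$, an exponential spatial covariance $\Sigma$ on a one-dimensional grid, an AR(1) process for $w_t$, Gaussian white noise standing in for the forcing series $S_t$, $V_t$, $C_t$, proxy errors that are white in time, and proxies that each read a single grid cell. All parameter values and the function name `simulate_hierarchy` are hypothetical. The sketch only generates data from the model; fitting it would still require the MCMC step described above.

```python
import numpy as np

def simulate_hierarchy(n_time=200, n_grid=10, n_proxy=6, seed=1):
    """Forward-simulate a toy version of the BHM (all settings hypothetical)."""
    rng = np.random.default_rng(seed)

    # NH mean process: y_t = alpha* + S_t w_S + V_t w_V + C_t w_C + w_t
    forcings = rng.normal(size=(n_time, 3))        # white-noise stand-ins for S_t, V_t, C_t
    omega = np.array([0.3, -0.5, 0.8])             # hypothetical forcing coefficients
    alpha_star = 0.0
    w = np.zeros(n_time)
    for t in range(1, n_time):                     # AR(1) interannual variation
        w[t] = 0.5 * w[t - 1] + rng.normal(scale=0.2)
    y = alpha_star + forcings @ omega + w

    # Space-time process: T_t = y_t 1 + v_t, v_t = A v_{t-1} + e_t, e_t ~ N(0, Sigma)
    A = 0.6 * np.eye(n_grid)                       # assumed diagonal VAR(1) matrix
    grid = np.arange(n_grid)
    dist = np.abs(np.subtract.outer(grid, grid))
    Sigma = 0.25 * np.exp(-dist / 3.0)             # assumed exponential spatial covariance
    chol = np.linalg.cholesky(Sigma)
    v = np.zeros((n_time, n_grid))
    for t in range(1, n_time):
        v[t] = A @ v[t - 1] + chol @ rng.normal(size=n_grid)
    T = y[:, None] + v

    # Data level: x_{t,i} = gamma_i h_i T_t + u_{t,i}; each h_i picks one grid cell
    H = np.zeros((n_proxy, n_grid))
    H[np.arange(n_proxy), rng.choice(n_grid, size=n_proxy, replace=False)] = 1.0
    gamma = rng.uniform(0.5, 1.5, size=n_proxy)    # hypothetical calibration parameters
    u = rng.normal(scale=0.3, size=(n_time, n_proxy))  # independent across proxies
    X = (T @ H.T) * gamma + u
    return y, T, X

y, T, X = simulate_hierarchy()
print(X.shape, np.round([np.corrcoef(X[:, i], y)[0, 1] for i in range(X.shape[1])], 2))
```

Each proxy correlates with $y_t$ only through the temperature field, which mirrors the assumption that proxy errors are independent between proxies.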

Benefits of the hierarchical approach. It is better to model how a proxy 
depends on the observed climate than to formulate a prediction model 
through a direct relationship. Climate scientists working on a particular 
proxy spend much effort in understanding and quantifying this forward 
relationship: conditional on the climate, what would be the response of the 
proxy? Thus, the data level is a useful framework to incorporate their 
expert knowledge into the reconstruction. The process level is attractive to 
geoscientists as well because it builds in constraints in the reconstruction 
that are reasonable and well accepted. One strategy for formulating this 
process model is to use the output from high-resolution climate system 
models to identify the form of the vector autoregression ($A$) and the spatial 
correlations among the innovations ($\Sigma$). The BHM handles missing and 
irregular proxy information and temperatures in a consistent way. The 
posterior distribution can be sampled when proxies are missing over different 
time periods, so a single statistical model is used to derive the 
reconstruction at all times. This contrasts with the direct approach, where 
one has to use a separate model for each subset of proxies that is available 
for a given reconstruction period. 
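The inversion by Bayes' theorem and the handling of missing proxies can be illustrated in a deliberately simplified case. Drop the spatial field so that each proxy reads $x_{t,i} = \gamma_i y_t + u_{t,i}$, treat $\gamma_i$ and the noise standard deviations $\sigma_i$ as known, give $y_t$ a Gaussian prior, and assume independent Gaussian errors. The posterior of $y_t$ is then Gaussian in closed form, and a missing proxy simply drops out of the sums. This conjugate shortcut, implemented by the hypothetical helper `posterior_nh_mean`, stands in for MCMC, which becomes necessary once the parameters are unknown.

```python
import numpy as np

def posterior_nh_mean(x, gamma, sigma, prior_mean=0.0, prior_sd=1.0):
    """Gaussian posterior mean and sd of y_t for each row of x (hypothetical helper).

    Simplified model: x[t, i] = gamma[i] * y_t + u[t, i], u ~ N(0, sigma[i]^2),
    y_t ~ N(prior_mean, prior_sd^2); NaN entries mark missing proxies.
    """
    x = np.asarray(x, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    observed = ~np.isnan(x)                     # proxies available at each time
    x_filled = np.where(observed, x, 0.0)       # missing proxies contribute zero to the sum
    # Each observed proxy adds gamma_i^2 / sigma_i^2 to the posterior precision
    precision = 1.0 / prior_sd**2 + np.where(observed, gamma**2 / sigma**2, 0.0).sum(axis=-1)
    weighted = prior_mean / prior_sd**2 + (gamma * x_filled / sigma**2).sum(axis=-1)
    return weighted / precision, np.sqrt(1.0 / precision)

gamma = np.array([1.0, 2.0])
sigma = np.array([0.5, 0.5])
x = np.array([[1.0, 2.0],          # both proxies observed
              [np.nan, 2.0],       # first proxy missing
              [np.nan, np.nan]])   # no proxies: posterior equals prior
mean, sd = posterior_nh_mean(x, gamma, sigma)
print(np.round(mean, 3), np.round(sd, 3))
```

One function covers every pattern of missingness. With no proxies available (the last row), the posterior falls back to the prior. Each additional proxy adds $\gamma_i^2/\sigma_i^2$ to the posterior precision.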

The hierarchical model and the indirect approach avoid the problem of 
proxy centering that was first encountered by Mann. Direct approaches, even 
those using the Lasso, can suffer from the attenuation effects caused by 
measurement errors in proxies [Ammann, Genton and Li (2010)]. If this effect 
is not corrected, the RMSE can be misleading. For example, the in-sample 
mean could have a smaller RMSE than a reconstruction that is biased by 
attenuation. However, the biased reconstruction could still capture the basic 
structure of the temperature process, whereas the in-sample mean contains no 
information about it. In contrast, hierarchical models based on the indirect 
approach are free of this concern. Overall, we believe that BHMs are a 
success in transferring mainstream statistical ideas to a substantial 
application in the geosciences, and we thank the authors again for initiating 
this discussion. 
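The attenuation effect described above can be sketched with a single proxy equal to temperature plus independent measurement error. This is an assumption for illustration and corresponds to $\gamma_i = 1$. Regressing temperature on the noisy proxy, as a direct approach does, shrinks the slope toward the reliability ratio $\lambda = \sigma_T^2 / (\sigma_T^2 + \sigma_u^2)$. As a result, the reconstruction's amplitude is damped even though it still tracks the temperature series. The function name `attenuation_demo` and its parameter values are hypothetical.

```python
import numpy as np

def attenuation_demo(n=100_000, noise_sd=1.0, seed=2):
    """Hypothetical illustration of regression attenuation from proxy measurement error."""
    rng = np.random.default_rng(seed)
    temp = rng.normal(size=n)                               # true temperature, variance 1
    proxy = temp + rng.normal(scale=noise_sd, size=n)       # proxy = temperature + measurement error
    # Direct approach: least-squares slope of temperature on the noisy proxy
    slope = np.cov(proxy, temp)[0, 1] / np.var(proxy, ddof=1)
    recon = temp.mean() + slope * (proxy - proxy.mean())    # fitted reconstruction
    reliability = 1.0 / (1.0 + noise_sd**2)                 # lambda = var(T) / (var(T) + var(u))
    amplitude_ratio = recon.std() / temp.std()              # damping of the reconstructed signal
    return slope, reliability, amplitude_ratio, np.corrcoef(recon, temp)[0, 1]

slope, lam, amp_ratio, corr = attenuation_demo()
print(f"slope={slope:.3f}  lambda={lam:.3f}  amplitude ratio={amp_ratio:.3f}  corr={corr:.3f}")
```

With `noise_sd=1.0` the reliability ratio is 0.5. The estimated slope should therefore land near 0.5, and the reconstruction's standard deviation should be about $\sqrt{0.5} \approx 0.71$ of the true one.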